Predicting the quality of questions on Stackoverflow

Authors

  • Antoaneta Baltadzhieva
  • Grzegorz Chrupala
Abstract

Community Question Answering (CQA) websites are growing in popularity as a way of providing and searching for information. CQA sites attract users because they offer a direct and rapid way to find the desired information. Because recognizing good questions can improve CQA services and the user's experience, the current study focuses on question quality. Specifically, we predict question quality and investigate the features that influence it. We test the influence of the question tags, the length of the question title and body, the presence of a code snippet, the user's reputation, and the terms used to formulate the question. For each set of dependent variables, Ridge regression models are estimated. The results indicate that including terms in the models improves their predictive power. Additionally, we investigate which lexical terms characterize high- and low-quality questions by semantically analyzing the terms with the highest and lowest coefficients. The analysis shows that terms predicting high quality express, among other things, excitement or negative experience, or relate to exceptions. Terms predicting low quality contain spelling errors, indicate off-topic questions, or are interjections.
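The modelling step described in the abstract can be illustrated with a minimal sketch: a closed-form ridge regression over a few hand-built features, followed by a ranking of the term coefficients. Everything in the block is an assumption made for illustration. The toy questions, the scores, the vocabulary, the feature encoding, and the penalty `alpha` are invented and are not taken from the paper, which does not state these details here.

```python
# Toy ridge regression in pure Python (no third-party packages).
# All data, feature names and the alpha value are hypothetical.

def featurize(question, vocabulary):
    """Map a question dict to a numeric feature vector."""
    tokens = question["body"].lower().split()
    features = [
        1.0,                                     # bias term
        float(len(question["title"].split())),   # title length in words
        float(len(tokens)),                      # body length in words
        1.0 if question["has_code"] else 0.0,    # code snippet present
    ]
    # Binary bag-of-terms indicators, one per vocabulary entry.
    features += [1.0 if term in tokens else 0.0 for term in vocabulary]
    return features


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(X, y, alpha):
    """Return w minimizing ||y - Xw||^2 + alpha * ||w||^2.

    For simplicity the bias weight is penalized like every other weight.
    """
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    for i in range(d):
        xtx[i][i] += alpha  # ridge penalty on the diagonal
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


# Invented example questions with invented quality scores.
questions = [
    {"title": "NullPointerException in loop",
     "body": "exception thrown when iterating list", "has_code": True, "score": 5.0},
    {"title": "plz help",
     "body": "plz fix my homework", "has_code": False, "score": -2.0},
    {"title": "Unexpected exception from parser",
     "body": "parser raises exception on empty input", "has_code": True, "score": 4.0},
    {"title": "homework",
     "body": "plz do homework quickly", "has_code": False, "score": -3.0},
]
vocabulary = ["exception", "plz", "homework"]

X = [featurize(q, vocabulary) for q in questions]
y = [q["score"] for q in questions]
w = fit_ridge(X, y, alpha=1.0)

# Inspect the term coefficients, highest first.
term_weights = dict(zip(vocabulary, w[4:]))
ranked_terms = sorted(term_weights, key=term_weights.get, reverse=True)
print(ranked_terms)
```

On real data one would more likely use an established library implementation and a much larger vocabulary; the sketch only shows how term indicators enter the same linear model as the other features, so that their coefficients can be sorted and inspected.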

Similar resources

Predicting Tags for StackOverflow Questions

We present a system that is able to automatically assign tags to questions from the question-answering site StackOverflow. Our system consists of a programming language detection system and a SVM using content-based features. When testing on an unseen test set, we achieve a mean F1 of 0.41 on this task.

Predict Closed Questions on StackOverflow

Millions of programmers use StackOverflow to get high quality answers to their programming questions every day. An effective culture of moderation has evolved to safeguard it. More than six thousand new questions are asked on StackOverflow every weekday. Currently about 6% of all new questions end up "closed". The goal of this paper is to build a classifier that predicts whether or not ...

Improving the Retrieval of Related Questions in StackOverflow

Master of Science thesis by Rezvan Ghaderi, Information and Computer Science, University of California, Irvine, 2015; Professor Cristina V. Lopes, Chair. StackOverflow is a very popular Q&A website, known to all software developers. Developers can either post their coding questions on the website to be answered by other develop...

Predicting Programming Community Popularity on StackOverflow from Initial Affiliation Networks

StackOverflow has become a popular question and answer site for programmers since its launch in 2008. As new programming frameworks and languages emerge, StackOverflow communities form around the new tags to ask and answer questions. Our analysis investigated both the affiliation network between tags across the lifetime of StackOverflow, as well as the relationship between the initial affiliation n...

On code reuse from StackOverflow: An exploratory study on Android apps

Context: Source code reuse has been widely accepted as a fundamental activity in software development. Recent studies showed that StackOverflow has emerged as one of the most popular resources for code reuse. Therefore, a plethora of work proposed ways to optimally ask questions, search for answers and find relevant code on StackOverflow. However, little work studies the impact of code reuse fr...

Publication date: 2015